Modeling Heterogeneity: Fixed Effect Models vs. Random Coefficient Models

Continuing with heterogeneous variance modeling, random coefficient models are a specific type of mixed effect model that add richness to the variance through random effects. This adds to the toolbox we built last week, where we modeled the covariance matrix directly. Since we now have two “angles” of attack for giving structure to the variance \(Var(Y)\), some terminology helps: SAS refers to covariance modeling through random effects as “G”-side, while modeling the covariance directly, as we did last week, is called “R”-side (for “residuals”). The terminology becomes clearer with the framework:

\[ \begin{aligned} Y = X\beta + Zu + e \end{aligned} \]

where:

  • \(u\sim N(0,G)\)
  • \(e \sim N(0, R)\)

\[ \begin{aligned} Var(Y) &= Z\underbrace{\operatorname{Var}(u)}_GZ' + \underbrace{\operatorname{Var}(e)}_{R} \\ &= \Sigma \end{aligned} \] Note that when random effects are present, we can distinguish the following variances:

  • \(\operatorname{Var}(Y | u) = R\) “conditional variance”
  • \(\operatorname{Var}(Y) = ZGZ' + R\) “marginal (individual) variance”
  • \(\operatorname{Var}(u) = G\) “random effect variance”

These are directly extractable through lme with getVarCov(obj, type = c("random.effects", "conditional", "marginal")).
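As a quick sanity check on the framework, we can build \(ZGZ' + R\) by hand for a toy random intercept model (illustrative numbers, not fit to any data):

```r
# Toy check of Var(Y) = Z G Z' + R: one child, n = 3 timepoints,
# random intercept variance tau2 (the 1x1 G), iid residual variance sigma2.
n      <- 3
tau2   <- 4
sigma2 <- 1
Z     <- matrix(1, nrow = n, ncol = 1)  # random intercept design
G     <- matrix(tau2, 1, 1)
R     <- sigma2 * diag(n)
Sigma <- Z %*% G %*% t(Z) + R           # marginal Var(Y)
Sigma
# diagonal = tau2 + sigma2 = 5, off-diagonal = tau2 = 4:
# a random intercept induces a compound symmetry marginal covariance.
```

This is also why a G-side random intercept and an R-side corCompSymm fit imply the same marginal covariance (at least when the common correlation is positive).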

In the fixed effects models, we have

\[ \begin{aligned} Y = X\beta + \varepsilon \end{aligned} \]

where:

  • \(\varepsilon \sim N(0,\Sigma)\)

Again, we have \(\operatorname{Var}(Y) = \Sigma\). This is exclusively “R”-side modeling: with no random effects, the \(ZGZ'\) term is 0.

Introducing Autism Data (part b)

autism <- read.csv("data/autism.csv") %>% 
  mutate(sicdegp = factor(sicdegp),
         childid = factor(childid),
         agef = factor(age)) # if we choose to model age as a factor 

autism_complete <- autism[complete.cases(autism),]

autism %>% ggplot(aes(age, vsae, group = childid, color = sicdegp)) +
  geom_line(alpha = as.numeric(autism$childid)^2 / 30000) + # hack to emphasize just a few response curves
  facet_wrap(~sicdegp)
## Warning: Removed 1 row(s) containing missing values (geom_path).

In context of this dataset, we note some research questions in particular that we are interested in modeling:

  • The age trend seems different among sicdegp levels.
    • I’d like to quantify the differences in these trends, and pick models that make this comparison easier to interpret.
  • The age trend itself seems mostly increasing linear, but each child seems to have their own linear trajectory.
    • There is perhaps a sharper inflection upwards around age 9. It would be worthwhile testing models that inflect upward as age increases.
  • There seems to be greater variability in the slope for sicdegp = 2,3 than in sicdegp = 1.
    • There is fanning present from variability in slopes for each child, but the diversity among those slopes seems to differ as well.
    • We should make another plot that more directly shows/quantifies the extent of this. This also helps inform whether or not trying heterogenous models would be empirically necessary.
  • There are a non-trivial number of individuals that show “drop out” before the end of the study.
    • Since the data doesn’t show the incompleteness, we should make the missingness “explicit” and try to quantify the extent to which it may bias our results.
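As a rough sketch of making the missingness explicit (base R on toy data; the column names mirror the autism file but the values are made up), cross-tabulating child by age exposes missed visits as zero cells:

```r
# Toy long data: child 1 completes all 5 visits, children 2 and 3 drop out early.
toy <- data.frame(
  childid = rep(1:3, times = c(5, 3, 2)),
  age     = c(2, 3, 5, 9, 13,  2, 3, 5,  2, 3)
)
obs <- table(toy$childid, toy$age)  # zero cells mark missed visits
obs
visits <- rowSums(obs)              # visits per child; < 5 flags drop-out
visits
```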

Fixed Effect Modeling (part c)

In this section, we try to “push” fixed effects as far as we can by modeling the mean and covariance (R-side) and see how good a model we can get. We’ll first focus on modeling the mean.

Modeling the Mean

From the exploratory plots, we loosely fit linear/quadratic/stick models to the data.

# without child id
autism_lm <- lm(vsae ~ age*sicdegp, data = autism_complete) 

# with child id, varying intercept
autism_lm_id <- lm(vsae ~ age*sicdegp + childid, data = autism_complete)

# with child id varying slope and intercept
# Note: these estimates are the same as if we subset dataset to just the single child and ran lm. 
# sum(coef(autism_lm_id_slope)[c("(Intercept)", "childid10")])
# sum(coef(autism_lm_id_slope)[c("age", "age:childid10")])
# coef(lm(vsae~age, data = subset(autism, childid == 10)))
autism_lm_id_slope <- lm(vsae ~ age + childid + childid:age, data = autism_complete)


# stick models
autism_stick <- autism_complete %>% 
  mutate(age_gt_9 = as.numeric(age > 9), # create indicator for stick models
         age_relu_9 = age_gt_9 * (age - 9)) # ReLu function for adding slope above 9

autism_lm_stick <- lm(vsae ~ age*sicdegp + age_relu_9:sicdegp, data = autism_stick) # without child id, "pooled"
autism_lm_stick_id <- lm(vsae ~ age*sicdegp + age_relu_9:sicdegp + childid, data = autism_stick) # with child id, varying intercept
autism_lm_stick_id_slope <- lm(vsae ~ age + childid + age:childid + age_relu_9:childid, data = autism_stick) # individual stick model

# quadratic trend
autism_lm_quad <- lm(vsae ~ age*sicdegp + I(age^2):sicdegp, data = autism_complete)
autism_lm_quad_id <- lm(vsae ~ age*sicdegp + I(age^2):sicdegp + childid, data = autism_complete) # id specific intercept
autism_lm_quad_id_slope1 <- lm(vsae ~ age*childid + I(age^2):sicdegp, data = autism_complete) # id specific intercept and linear terms
autism_lm_quad_id_slope2 <- lm(vsae ~ age*childid + I(age^2)*childid, data = autism_complete) # id specific intercept, linear and quadratic


# using poly macro
# autism_lm_poly2 <- lm(vsae ~ poly(age, degree = 2) * sicdegp, data = autism)
# autism_lm_poly2_id <- lm(vsae ~ poly(age, degree = 2) * childid, data = autism)
# predictions, plotted raw
autism_predict <- autism %>% filter(complete.cases(.)) %>% 
  add_column(
  yhat_lm = predict(autism_lm),
  yhat_lm_id = predict(autism_lm_id),
  yhat_lm_id_slope = predict(autism_lm_id_slope),
  yhat_lm_stick = predict(autism_lm_stick),
  yhat_lm_stick_id_slope = predict(autism_lm_stick_id_slope),
  yhat_lm_quad = predict(autism_lm_quad),
  yhat_lm_quad_id = predict(autism_lm_quad_id),
  yhat_lm_quad_id_slope1 = predict(autism_lm_quad_id_slope1),
  yhat_lm_quad_id_slope2 = predict(autism_lm_quad_id_slope2)
  )

autism_predict %>% 
  pivot_longer(cols = c(vsae, starts_with("yhat")),
                                names_to = "type",
                                values_to = "y") %>% 
  arrange(childid, age) %>% 
  ggplot(aes(age, y, group = childid, color = type)) +
  geom_line(alpha = .4) +
  facet_grid(type~sicdegp)

  • The stick model estimates look a little funky because some individuals have observations only at age = 2, 13, so a stick model is degenerate for those individuals and the plotting simply shows the direct line instead of the stick estimates. I think the same thing is happening in the id quad estimates.

autism_fixed_ic <- AIC(autism_lm,
    autism_lm_id,
    autism_lm_id_slope,
    autism_lm_stick,
    autism_lm_stick_id_slope,
    autism_lm_quad,
    autism_lm_quad_id,
    autism_lm_quad_id_slope1,
    autism_lm_quad_id_slope2) %>% 
  add_column(
    BIC = BIC(autism_lm,
              autism_lm_id,
              autism_lm_id_slope,
              autism_lm_stick,
              autism_lm_stick_id_slope,
              autism_lm_quad,
              autism_lm_quad_id,
              autism_lm_quad_id_slope1,
              autism_lm_quad_id_slope2)$BIC)

autism_fixed_ic %>% 
  kbl(format = "html",
      caption = "Fixed Model Mean Information Criteria",
      table.attr = "style='width:50%;'") %>% 
  kable_classic(full_width = TRUE) %>% 
  column_spec(3, color = c("black", "red")[as.numeric(autism_fixed_ic$AIC == min(autism_fixed_ic$AIC)) + 1]) %>% 
  column_spec(4, color = c("black", "red")[as.numeric(autism_fixed_ic$BIC == min(autism_fixed_ic$BIC)) + 1])
Fixed Model Mean Information Criteria

                             df      AIC      BIC
autism_lm                     7 5538.140 5569.035
autism_lm_id                162 5440.346 6155.327
autism_lm_id_slope          315 4553.386 5943.626
autism_lm_stick              10 5538.678 5582.813
autism_lm_stick_id_slope    408 4153.635 5954.326
autism_lm_quad               10 5539.403 5583.538
autism_lm_quad_id           165 5440.750 6168.970
autism_lm_quad_id_slope1    318 4542.668 5946.148
autism_lm_quad_id_slope2    457 4237.499 6254.450

The information criteria here show something very interesting! AIC chooses one of the most complex models, with 408 parameters, while BIC chooses the model with 7 parameters! How you go about model selection is very much a philosophical decision, and the information criteria tend to reflect those camps of thinking. I’ll be moving forward with autism_lm.

Diagnostics

We can use the diagnostic plots to examine missing trends and start to get a sense of the variance modeling we will need.

Simple Linear Model

par(mfrow = c(2,2))
plot(autism_lm)

Individual Stick Model

par(mfrow = c(2,2))
plot(autism_lm_stick_id_slope) # danger with this model is that there are a number of points with leverage one.

Based on these plots, we see that when we choose the heavily parameterized models, we risk overfitting: a number of observations have leverage 1.
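To see where the leverage-1 points come from, consider a child observed at only two ages: a child-specific intercept and slope interpolate those two points exactly (a sketch with made-up numbers):

```r
# Two observations, two parameters: the line passes through both points,
# so each observation has hat value exactly 1 and residual 0.
two_pts <- data.frame(age = c(2, 13), vsae = c(5, 40))
fit <- lm(vsae ~ age, data = two_pts)
hatvalues(fit)  # both equal 1
resid(fit)      # both (numerically) 0
```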

I would ultimately base modeling decisions on specific research questions the scientist had. If there was greater interest in ages 10 - 13, I may be more hesitant to choose the stick model because it’s quite overfit in this domain. If they wanted to summarize “rules of thumb” for presentation or big picture of this trait, I’d likely choose the simple linear model. If they were hoping for prediction applications based on younger children and valued accuracy of predictions, I may opt for the over-parameterized models like quadratic or stick model.

Modeling the variance

For simplicity, we’ll choose the linear functional form: it is quite simple and performs reasonably well for the number of parameters it uses. Note that the variance modeling will, of course, depend on your chosen mean model.

Similar to how we modeled the mean, we start with trying to visualize the covariance matrix with as few restrictions as possible, so we fit an unstructured covariance matrix with gls.

# start with unstructured covariance matrix, and let the optimizer tell us the best estimate with no restrictions.
autism_gls_un <- gls(vsae~age*sicdegp,
                     correlation = corSymm(form = ~1 | childid), # unstructured correlation
                     weights = varIdent(form = ~1 | agef), # parameter for each entry along diagonal
                     data = autism_complete)
getVarCov(autism_gls_un)
## Marginal variance covariance matrix
##         [,1]     [,2]     [,3]    [,4]     [,5]
## [1,] 10.8500   8.9204   9.4352   25.49   48.503
## [2,]  8.9204  55.6110  45.2460  108.23  203.970
## [3,]  9.4352  45.2460 135.6600  223.11  404.000
## [4,] 25.4900 108.2300 223.1100  765.99 1206.800
## [5,] 48.5030 203.9700 404.0000 1206.80 2319.400
##   Standard Deviations: 3.2939 7.4573 11.647 27.677 48.16

There are some visual guides that we can use to help us model the covariance:

  1. line graph of matrix entries, grouped by row
  2. Variogram
  3. Autocorrelation Function

1. Line Graph of matrix entries, grouped by row

I picked up this visualization from Generalized Linear Mixed Models by Walter Stroup, one of the authors of the SAS for Mixed Models book. I like this visualization better than a heat map because instead of using a color channel for the variance, it uses y-position which is much more clear.

sigmahat <- getVarCov(autism_gls_un)

sigmahat_df <- data.frame(row = rep(1:5, each = 5),
           col = rep(1:5, 5),
           age = rep(c(2, 3, 5, 9, 13), 5), # observed ages
           cov = c(sigmahat)) %>% 
  filter(row <= col) # pull upper triangle w/ diagonal entries

sigmahat_df %>% 
  ggplot(aes(col, cov, color = factor(row), group = factor(row))) +
  geom_point() +
  geom_line()

This graphic is confusing at first, but I think it’s one of the more intuitive visualizations once you’re used to it. Each point in the graph is an estimated covariance/variance. The lines show the trend within each row as you move away from the main diagonal. For example, the red line is the first row of \(\hat \Sigma\); there are 5 dots because the first row starts in column 1. The main diagonal (all the variances) comprises the left-most point of each trend line.

We see a similar pattern of the estimated covariances increasing as age increases. Column 5 represents age 13, where the estimated variance is ~2300.

Since age is meaningful on a continuous scale, it is slightly more helpful to place the estimated covariances at the appropriate distances on the x-axis.

sigmahat_age_plot <- sigmahat_df %>%
  mutate(diag = col - row) %>% 
  ggplot(aes(age, cov, color = factor(row), group = factor(row))) +
  geom_point() +
  geom_line()
sigmahat_age_plot

We can see some resemblance of a power relationship for the main diagonal, and there is a pretty clean pattern in the covariance here, so it’s likely we can capture most of the trends with just a few extra parameters. I would like to graphically test what I’m thinking, so I’ll add a smoothed parametric fit to the left-most points of each colored trend (the main diagonal).

# hackish way to checking how a power relationship on age might fit for the heterogenous variance function
sigmahat_age_plot + 
  geom_line(stat = "smooth",
            method = "lm",
              aes(group = diag),
              formula = (y~ I(x^3)), # can play with this form to visualize how the variance estimates might look. Try exponential here, doesn't fit well! (so AR covariance models probably inappropriate)
              se = FALSE,
              size = .2,
              linetype = 2,
              alpha = .4,
              color = "black") 

The main diagonal band here seems like a pretty decent fit, which is primarily what I’m interested in here. The fact that the off-diagonal bands also fit this power relationship pretty well implies that I can probably get away with a fairly simple covariance matrix structure (few parameters for off diagonals, like compound symmetry) and still describe this covariance matrix quite well.

2. Semi-Variogram

This is another method of visualizing covariances, popular in spatial statistics. It is most useful when one (or more) of your covariates has a continuous/spatial interpretation. In this case, we are interested in how the correlation between observations changes as the distance in age increases.

The semi-variogram (for isotropic and stationary processes) is defined as:

\[ \begin{aligned} \gamma(h) &= \frac{1}{2}Var(\epsilon_{age} - \epsilon_{age+h}) \\ &= \frac{1}{2}E(\epsilon_{age} - \epsilon_{age+h})^2 \end{aligned} \]

Empirically, we estimate this by taking pairs of observations a given distance apart and averaging the squared differences of the estimated residuals.
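A hand-rolled version of that estimator on simulated residuals (my own toy construction, not the internals of nlme's Variogram) looks like:

```r
# Empirical semivariogram by hand: for each pair of ages, average
# (1/2) * (e_i - e_j)^2 across subjects, giving gamma-hat at that age distance.
set.seed(1)
ages    <- c(2, 3, 5, 9, 13)
n_child <- 200
# residual matrix: rows = ages, cols = children (iid noise here)
e <- matrix(rnorm(length(ages) * n_child), nrow = length(ages))
pairs <- t(combn(seq_along(ages), 2))
semivar <- apply(pairs, 1, function(p) mean((e[p[1], ] - e[p[2], ]) ^ 2) / 2)
data.frame(dist = ages[pairs[, 2]] - ages[pairs[, 1]], semivar)
# with iid residuals every estimate hovers near sigma^2 = 1 (a flat variogram);
# correlated residuals would instead rise with distance, as in the plot below.
```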

vario_autism <- Variogram(autism_gls_un, form = ~ age | childid, resType = "response")
vario_autism
##      variog dist n.pairs
## 1  22.25863    1     147
## 2  41.44491    2      86
## 3  53.94080    3      91
## 4 303.74329    4     153
## 5 275.86605    6     115
## 6 382.44231    7     118
## 7 707.25001    8      49
## 8 841.57324   10      91
## 9 927.39613   11      93

We have oddly spaced age observations, age = c(2, 3, 5, 9, 13), so it’s pretty distinctive here that our estimate at \(dist = 1\) comes from \(age = 2, 3\), and \(dist = 8\) comes from \(age = 5, 13\).

plot(vario_autism)
Figures from Mixed Effects Models

3. Auto Correlation Functions

(Sample) autocorrelation functions show the correlation of a series with lagged versions of itself.

\[ \begin{aligned} ACF(k) = \frac{\operatorname{Cov}(\varepsilon, \varepsilon_{lag(k)})}{\operatorname{Var}(\varepsilon)} \end{aligned} \]

The ACF is most useful when the data are more time-series-like, with many measured timepoints and somewhat equally spaced measurements, because we’re calculating the correlation of the series with itself.
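Concretely, here is the sample version of that formula computed by hand and checked against stats::acf() (toy AR simulation, not the autism residuals):

```r
# Sample ACF at lag k: lag-k autocovariance over the lag-0 autocovariance.
sample_acf <- function(x, k) {
  xc <- x - mean(x)
  n  <- length(x)
  sum(xc[1:(n - k)] * xc[(k + 1):n]) / sum(xc^2)
}
set.seed(42)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # AR(1), phi = 0.5
sample_acf(x, 1)
# identical (up to floating point) to acf(x, plot = FALSE)$acf[2]
```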

ACF(autism_gls_un, form = ~ age | childid, resType = "response")
##   lag        ACF
## 1   0 1.00000000
## 2   1 0.46705931
## 3   2 0.19978821
## 4   3 0.11193700
## 5   4 0.09748354
plot(ACF(autism_gls_un, form = ~ age | childid, resType = "response"), alpha = .01) # observed - fitted, not accounting for covariance estimates

We can ignore the first bar; we’re looking at the trend made by the tops of the bars. An AR1 model would have the tops of the bars decrease exponentially quickly. The fact that the bars poke out past the dotted alpha = .01 bounds means there’s likely some sequential correlation in our data that we haven’t accounted for.

plot(ACF(autism_gls_un, form = ~ age | childid, resType = "normalized"), alpha = .01) # accounting for our estimated covariance matrix

If we plot the ACF of residuals that account for our unstructured covariance matrix, we can see the sequential correlations drop out of significance.

Fitting variance models

There are many covariance shapes we can try here, and SAS has even more! See SAS Repeated Statement.
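For orientation (my own notation, shown for three timepoints): compound symmetry spends a single correlation parameter \(\rho\) shared across all pairs, while its heterogeneous cousin additionally frees up the diagonal:

\[ \begin{aligned} \Sigma_{cs} = \sigma^2\begin{pmatrix}1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1\end{pmatrix}, \qquad \Sigma_{csh} = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 & \rho\sigma_2\sigma_3 \\ \rho\sigma_1\sigma_3 & \rho\sigma_2\sigma_3 & \sigma_3^2\end{pmatrix} \end{aligned} \]

The continuous AR(1) structure instead sets \(\operatorname{Corr}(e_i, e_j) = \phi^{|t_i - t_j|}\), i.e. exponential decay in the age distance.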

# heterogeneous, diagonal
autism_gls_het <- gls(vsae~age*sicdegp,
                      weights = varIdent(form = ~ 1 | agef),
                      data = autism_complete)

# should be a bad fit since we already know there is strong heterogeneity
autism_gls_cs <- gls(vsae~age*sicdegp,
                     correlation = corCompSymm(form = ~1 | childid),
                     data = autism_complete)

# heterogeneous, compound symmetry
autism_gls_csh <- gls(vsae~age*sicdegp,
                     correlation = corCompSymm(form = ~1 | childid),
                     weights = varIdent(form = ~ 1 | agef),
                     data = autism_complete)

# continuous autoregressive
# correlation = \phi^distance
autism_gls_carh <- gls(vsae~age*sicdegp,
                      correlation = corCAR1(form = ~ age | childid), # i.e. exponential decay with distance
                      weights = varIdent(form = ~1 | agef),
                      data = autism_complete)

# based on the visualizations, should be decent
# varPower: sd = |age|^theta, so variance is proportional to age^(2*theta)
autism_gls_pow <- gls(vsae~age*sicdegp,
                      correlation = corCompSymm(form= ~1 | childid),
                      weights = varPower(form = ~age), # fit heterogeneous variance function
                      data = autism_complete)

autism_gls_cpow <- gls(vsae~age*sicdegp,
                      correlation = corCompSymm(form= ~1 | childid),
                      weights = varConstPower(form = ~age), # additional constant variable to power relationship
                      data = autism_complete)

# with linear parameterization of corStruct
autism_gls_powlin <- gls(vsae~age*sicdegp,
                      correlation = corLin(form= ~age | childid),
                      weights = varPower(form = ~age), # fit heterogeneous variance function
                      data = autism_complete)
autism_gls_ic <- AIC(
    autism_gls_het,
    autism_gls_cs,
    autism_gls_csh,
    autism_gls_carh,
    autism_gls_un,
    autism_gls_pow,
    autism_gls_cpow,
    autism_gls_powlin) %>% 
  add_column(
    BIC = BIC(autism_gls_het,
              autism_gls_cs,
              autism_gls_csh,
              autism_gls_carh,
              autism_gls_un,
              autism_gls_pow,
              autism_gls_cpow,
              autism_gls_powlin)$BIC)

autism_gls_ic %>% kbl(caption = "Information Criteria for ") %>% 
  kable_classic(full_width = F) %>%
  column_spec(3, color = c("black", "red")[as.numeric(autism_gls_ic$AIC == min(autism_gls_ic$AIC)) + 1]) %>% 
  column_spec(4, color = c("black", "red")[as.numeric(autism_gls_ic$BIC == min(autism_gls_ic$BIC)) + 1])
Information Criteria for

                    df      AIC      BIC
autism_gls_het      11 4661.843 4710.282
autism_gls_cs        8 5461.844 5497.072
autism_gls_csh      12 4529.853 4582.696
autism_gls_carh     12 4604.352 4657.195
autism_gls_un       21 4482.384 4574.859
autism_gls_pow       9 4536.168 4575.800
autism_gls_cpow     10 4538.168 4582.203
autism_gls_powlin    9 4654.410 4694.043

At this stage, the unstructured covariance matrix fits best under either AIC or BIC, though notably the power variance structure has a very comparable BIC with only 9 parameters versus 21. Since I’ve already leaned on parsimony in choosing the very simple linear mean model, I’d elect to use autism_gls_pow as my final fixed effects model for the same reason. The model is also appealing because it explains the variance as simply a power function of age (respecting the continuous scale) while achieving similar information criteria.

Final Model

plot(autism_gls_pow)

Spread among the residuals looks much better against the fitted values.

Some basic inference from the fixed effect model…

intervals(autism_gls_pow) # asymptotic normal approximation, based on inverse Hessian
## Approximate 95% confidence intervals
## 
##  Coefficients:
##                   lower       est.      upper
## (Intercept)  -0.5248735  1.2011843  2.9272421
## age           2.2896001  3.1507950  4.0119900
## sicdegp2     -1.9420685  0.3411552  2.6243789
## sicdegp3     -5.7898786 -3.2664553 -0.7430321
## age:sicdegp2 -0.5218454  0.6171416  1.7561285
## age:sicdegp3  3.0205486  4.2837888  5.5470290
## 
##  Correlation structure:
##         lower      est.     upper
## Rho 0.3682194 0.4549824 0.5387205
## 
##  Variance function:
##          lower     est.    upper
## power 1.217197 1.291734 1.366271
## 
##  Residual standard error:
##    lower     est.    upper 
## 1.372947 1.575796 1.808615
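As a sanity check on the varPower fit: varPower models the residual standard deviation as \(\sigma \cdot |age|^\theta\), so the implied variance at a given age is \(\sigma^2 age^{2\theta}\). Plugging in the estimates above:

```r
# Implied variance at age 13 under the fitted varPower structure
# (sigma and theta copied from the intervals() output above).
sigma <- 1.575796
theta <- 1.291734
sigma^2 * 13^(2 * theta)
# ~1874: the same order as the ~2300 the unstructured fit estimated at age 13
# (the two models differ, so exact agreement isn't expected).
```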

Weighted Least Squares

I mentioned weighted least squares as a very simple quick fix that gives similar qualities of inference to more sophisticated fitting procedures (increased variance at larger ages). You can see here that it works decently, but since we’re not accounting for the correlation among measurements from the same individual, we can do much better.
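As an aside on what the weights are doing: lm() with weights \(w_i\) is just OLS after scaling each row by \(\sqrt{w_i}\). A self-contained sketch (simulated data, not the autism file):

```r
# WLS as "whitened" OLS: minimizing sum w*(y - Xb)^2 is the same as OLS on
# (sqrt(w)*y) ~ (sqrt(w)*X), which equalizes the residual variances.
set.seed(7)
x <- 1:20
y <- 2 + 3 * x + rnorm(20, sd = x)   # variance grows with x
w <- 1 / x^2                          # inverse-variance weights
fit_wls <- lm(y ~ x, weights = w)
fit_ols <- lm(I(y * sqrt(w)) ~ 0 + sqrt(w) + I(x * sqrt(w)))
unname(coef(fit_wls))
unname(coef(fit_ols))  # same two numbers: (intercept, slope)
```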

gls_un_var <- getVarCov(autism_gls_un, individual = 2) %>% diag()

autism_wls <- lm(vsae~age*sicdegp, weights = 1/(gls_un_var[autism$agef]), data = autism)

AIC(autism_wls)
## [1] 4655.256
BIC(autism_wls)
## [1] 4686.151

I believe comparing the information criteria from WLS to GLS is legitimate, but I’m not certain… in the WLS setting we’re assuming the variance parameters are known, which makes the comparison weird.

autism_new <- expand.grid(age = 2:13, sicdegp = factor(1:3))

# from direct lm
autism_new %>% bind_cols(predict(autism_lm, newdata = autism_new, interval = "confidence")) %>% pivot_longer(cols = fit:upr, names_to = "type") %>% 
  ggplot(aes(age, value, linetype = type, group = type)) +
  geom_line() + 
  facet_wrap(~sicdegp)

# from weighted least squares
autism_new %>% bind_cols(predict(autism_wls, newdata = autism_new, interval = "confidence")) %>% pivot_longer(cols = fit:upr, names_to = "type") %>%
  ggplot(aes(age, value, linetype = type, group = type)) +
  geom_line() + 
  facet_wrap(~sicdegp)

Mixed Effect Modeling (part d)

We’re working with a different toolbox now, as far as covariance modeling goes. We can now control the covariance matrix directly (R-side) as we did above or implicitly through the use of random effects/coefficients (G-side).

The advantages:

  1. Aligns parameter interpretation with the random design matrix \(Z\), which is more intuitive.
  2. Allows for rather complex covariance structures with very few parameters.
  3. Implicitly averages across individual models (shrinkage), while still letting us pull out the individual fits.

The disadvantages:

  1. Harder to visualize what’s happening with the G matrix and its effect on the final covariance matrix of Y.
  2. More computationally intensive, will probably be slower for larger datasets.

The reason I recommend lme over lmer is that you have more control over the structure of the “G” and “R” matrices in lme. lmer is only capable of fitting diagonal and unstructured covariances for G, and a homogeneous diagonal matrix for “R”. (And I recommend SAS over lme because the syntax is much easier!)

Fitting Models

# random intercept
autism_lme_id <- lme(fixed = vsae~sicdegp*age,
    random = ~ 1 | childid,
    data = autism_complete)

# random slopes model
# doesn't converge!!
autism_lme_id_slope <- lme(fixed = vsae~sicdegp*age,
    random = ~ age | childid,
    data = autism_complete)
## Error in lme.formula(fixed = vsae ~ sicdegp * age, random = ~age | childid, : nlminb problem, convergence error code = 1
##   message = iteration limit reached without convergence (10)

We hit some optimizer problems trying to fit the random slopes model. By default, lme uses the dated nlminb optimizer, a quasi-Newton routine similar to “BFGS”; it’s kept mostly for compatibility reasons, and optim is the general-purpose optimizer that is now preferred. lmeControl has the option opt = "optim", which switches the optimizer and then looks at optimMethod = "BFGS" to run the BFGS algorithm within optim.

We can also switch the call to lmer, since this model can be handled by that library as well. The default call chain is lmer -> nloptwrap (wrapper function) -> nloptr (R interface to NLopt) -> NLopt (free/open-source library for nonlinear optimization) -> NLOPT_LN_BOBYQA (the BOBYQA routine, written in C). BOBYQA is a derivative-free optimization routine.

I try to avoid diving down the optimizer rabbit hole as much as possible… Fix 2 is normally the route I take: fit successively simpler random effect structures until boundary issues disappear.

# Fix 1: change the optimizer to "optim" (BFGS) in lme
autism_lme_id_slope <- lme(fixed = vsae~sicdegp*age,
    random = ~ age | childid,
    data = autism_complete,
    control = lmeControl(opt = "optim"))
# summary(autism_lme_id_slope) # note correlation of random effects is _very_ close to boundary -1 (even though fits with no complaints)

# Fix 2: switch to lmer and simplify the random effect structure ------------------
# lmer fits, but warns about boundary...
autism_lmer_id_slope <- lmer(vsae~sicdegp*age + (age | childid), data = autism_complete)
## boundary (singular) fit: see help('isSingular')
# looking at summary, we see that correlation of random effects is -1 (boundary)
# summary(autism_lmer_id_slope)

# A useful function is "allFit", which tries to fit the model with "all" appropriate optimizers.
# they all have boundary warnings.
# allFit(autism_lmer_id_slope)

# try the uncorrelated model, still boundary w/ intercept variance estimated as 0.
autism_lmer_id_slope_nocor <- lmer(vsae~sicdegp*age + (age || childid), data = autism_complete)
## boundary (singular) fit: see help('isSingular')
# summary(autism_lmer_id_slope_nocor)

# take out random intercept, finally no boundary estimates, and no warnings!
autism_lmer_id_slope_noint <- lmer(vsae~sicdegp*age + (0 + age | childid), data = autism_complete)

Based on these issues, I’m skeptical the optimizer will be able to handle more complicated random coefficient models reliably. But we’ll try! We’ll try to fit the same gamut of linear/quadratic/stick models that we had fit in the fixed case. Since lmer is likely more familiar, I show these fits in lmer first, and the lme equivalent underneath.

### Fitting in lmer ---------------------------------------------------------------------
## Quadratic Models
autism_lmer_quad_id <- lmer(vsae ~ age*sicdegp + I(age^2):sicdegp + (1 | childid), # random intercept
                            data = autism_complete) # no warnings

# quadratic, with random linear term.
autism_lmer_quad_id_slope1 <- lmer(vsae~age + I(age^2):sicdegp + (1 + age | childid), # random intercept, linear
                                   data = autism_complete) # boundary warning
## boundary (singular) fit: see help('isSingular')
autism_lmer_quad_id_slope1_nocor <- lmer(vsae~age + I(age^2):sicdegp + (1 + age || childid), # random intercept, linear, uncorrelated
                                   data = autism_complete) # boundary warning
## boundary (singular) fit: see help('isSingular')
autism_lmer_quad_id_slope1_noint <- lmer(vsae~age + I(age^2):sicdegp + (0  + age | childid), # random linear, no intercept
                                   data = autism_complete) # no warnings

# Quadratic: unstructured, heterogenous G matrix, diagonal, homogenous R
autism_lmer_quad_id_slope2 <- lmer(vsae ~ age*sicdegp + I(age^2):sicdegp + (1 + age + I(age^2) | childid), data = autism_complete) # singular, corr = -1
## boundary (singular) fit: see help('isSingular')

# Quadratic: diagonal, heterogenous G matrix, diagonal, homogenous R
autism_lmer_quad_id_slope2_nocor <- lmer(vsae ~ age*sicdegp + I(age^2):sicdegp + (1 + age + I(age^2) || childid), data = autism_complete) # singular, intercept ≈ 0
## boundary (singular) fit: see help('isSingular')


# Quadratic, no intercept: unstructured, heterogenous G matrix, diagonal, homogenous R
autism_lmer_quad_id_slope2_noint <- lmer(vsae ~ age*sicdegp + I(age^2):sicdegp + (0 + age + I(age^2) | childid), data = autism_complete, 
                                         control = lmerControl(optimizer = "Nelder_Mead")) # no warnings

# Quadratic, no intercept: diagonal, heterogenous G matrix, diagonal, homogenous R
# autism_lmer_quad_id_slope2_noint_nocor <- lmer(vsae ~ age*sicdegp + I(age^2):sicdegp + (0 + age + I(age^2) || childid), data = autism_complete) # fail to converge
autism_lmer_quad_id_slope2_noint_nocor <- lmer(vsae ~ age*sicdegp + I(age^2):sicdegp + (0 + age + I(age^2) || childid), data = autism_complete, 
                                         control = lmerControl(optimizer = "Nelder_Mead")) # no warnings

## Stick Models
autism_lmer_stick_id <- lmer(vsae ~ age*sicdegp + age_relu_9:sicdegp + (1 | childid), data = autism_stick) 
autism_lmer_stick_id_slope <- lmer(vsae ~ age*sicdegp + age_relu_9:sicdegp + (0 + age + age_relu_9 | childid), data = autism_stick)
# Fitting in lme ---------------------------------------------------------------------
# Quadratic, random intercept: diagonal, homogenous R
autism_lme_quad_id <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp, # quadratic 
                       random = ~ 1 | childid, # random intercept
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))


# Quadratic, random linear: unstructured, heterogenous G matrix, diagonal, homogenous R
autism_lme_quad_id_slope1 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = ~ age | childid, # random intercept and linear term
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))

# Quadratic, random linear: diagonal, heterogenous G matrix:  diagonal, homogenous R
autism_lme_quad_id_slope1_nocor <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid = pdDiag(form = ~ age)), # random intercept and linear term, no correlation
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))

autism_lme_quad_id_slope1_noint <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid = pdDiag(form = ~ 0 + age)), # random intercept and linear term, no intercept
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))


# Quadratic, random quadratic: unstructured, heterogenous G matrix:  diagonal, homogenous R
autism_lme_quad_id_slope2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = ~ age + I(age^2) | childid, # random intercept, linear and quadratic term
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))

# Quadratic, random quadratic: diagonal, heterogenous G matrix:  diagonal, homogenous R
autism_lme_quad_id_slope2_nocor <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid=pdDiag(form = ~ age + I(age^2))), # random intercept, linear and quadratic term, no correlation
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))

# Quadratic, random quadratic (no int): unstructured, heterogeneous G matrix; diagonal, homogeneous R
autism_lme_quad_id_slope2_noint <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid=pdSymm(form = ~ 0 + age + I(age^2))), # Unstructured G
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))

# Quadratic, random quadratic (no int): diagonal, heterogeneous G matrix; diagonal, homogeneous R
autism_lme_quad_id_slope2_noint_nocor <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid=pdDiag(form = ~ 0 + age + I(age^2))),
                       data = autism_complete,
                       control = lmeControl(opt = "optim"))

## Stick models
autism_lme_stick_id <- lme(vsae ~ age*sicdegp + age_relu_9:sicdegp,
                       random = ~ 1 | childid, # random intercept
                       data = autism_stick,
                       control = lmeControl(opt = "optim"))
autism_lme_stick_id_slope <- lme(vsae ~ age*sicdegp + age_relu_9:sicdegp,
                       random = ~ 0 + age + age_relu_9 | childid, # random linear and stick terms, no intercept
                       data = autism_stick,
                       control = lmeControl(opt = "optim"))
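The stick models above use a hinge basis `age_relu_9` from a data frame `autism_stick` that isn't constructed in this chunk. A minimal sketch of what it presumably looks like, assuming the name means a linear spline with a knot at age 9:

```r
# hypothetical reconstruction of the stick (linear spline) basis, knot at age 9
autism_stick <- autism_complete %>%
  mutate(age_relu_9 = pmax(age - 9, 0)) # "ReLU" hinge: 0 up to age 9, then linear
```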
# see predicted values from some random effect models
autism_random_predict <- autism_complete %>% add_column(yhat_lme_id = predict(autism_lme_id),
                                 yhat_lme_id_slope = predict(autism_lme_id_slope),
                                 yhat_lme_quad_id = predict(autism_lme_quad_id),
                                 yhat_lme_quad_id_slope1 = predict(autism_lme_quad_id_slope1),
                                 yhat_lme_quad_id_slope2 = predict(autism_lme_quad_id_slope2),
                                 yhat_lme_stick_id = predict(autism_lme_stick_id),
                                 yhat_lme_stick_id_slope = predict(autism_lme_stick_id_slope),
                                 yhat_lmer_quad_id_slope2_noint = predict(autism_lmer_quad_id_slope2_noint),
                                 yhat_lmer_stick_id = predict(autism_lmer_stick_id),
                                 yhat_lmer_stick_id_slope = predict(autism_lmer_stick_id_slope))

autism_random_predict %>% pivot_longer(cols = c(vsae, starts_with("yhat")),
                                names_to = "type",
                                values_to = "y") %>% 
  arrange(childid, age) %>% 
  ggplot(aes(age, y, group = childid, color = type)) +
  geom_line(alpha = .4) +
  facet_grid(type~sicdegp)

autism_lme_ic <- AIC(
  autism_lme_id,
  autism_lme_id_slope, # warning here too, likely because the models are not all fit to the same number of observations
  autism_lme_quad_id,
  autism_lme_quad_id_slope1,
  autism_lme_quad_id_slope2,
  autism_lmer_quad_id_slope2, # lmer version, get warning when included, not quite sure why, maybe refit as ML? but very close to lme counterpart
  autism_lme_quad_id_slope2_nocor,
  autism_lmer_quad_id_slope2_nocor, # lmer version
  autism_lme_quad_id_slope2_noint,
  autism_lmer_quad_id_slope2_noint, # lmer version
  autism_lme_quad_id_slope2_noint_nocor,
  autism_lme_stick_id,
  autism_lme_stick_id_slope
    ) %>% 
  add_column(
    BIC = BIC(
      autism_lme_id,
    autism_lme_id_slope,
    autism_lme_quad_id,
    autism_lme_quad_id_slope1,
    autism_lme_quad_id_slope2,
    autism_lmer_quad_id_slope2, # lmer version
    autism_lme_quad_id_slope2_nocor,
    autism_lmer_quad_id_slope2_nocor, # lmer version
    autism_lme_quad_id_slope2_noint,
    autism_lmer_quad_id_slope2_noint, # lmer version
    autism_lme_quad_id_slope2_noint_nocor,
    autism_lme_stick_id,
    autism_lme_stick_id_slope)$BIC)
## Warning in AIC.default(autism_lme_id, autism_lme_id_slope, autism_lme_quad_id, :
## models are not all fitted to the same number of observations
## Warning in BIC.default(autism_lme_id, autism_lme_id_slope, autism_lme_quad_id, :
## models are not all fitted to the same number of observations
autism_lme_ic %>% kbl() %>% 
  kable_classic(full_width = F) %>% 
  column_spec(3, color = c("black", "red")[as.numeric(autism_lme_ic$AIC == min(autism_lme_ic$AIC)) + 1]) %>% 
  column_spec(4, color = c("black", "red")[as.numeric(autism_lme_ic$BIC == min(autism_lme_ic$BIC)) + 1])
df AIC BIC
autism_lme_id 8 5461.844 5497.072
autism_lme_id_slope 10 4716.130 4760.166
autism_lme_quad_id 11 5470.017 5518.402
autism_lme_quad_id_slope1 13 4724.230 4781.412
autism_lme_quad_id_slope2 16 4640.705 4711.083
autism_lmer_quad_id_slope2 16 4639.983 4710.598
autism_lme_quad_id_slope2_nocor 13 4680.454 4737.636
autism_lmer_quad_id_slope2_nocor 13 4680.391 4737.766
autism_lme_quad_id_slope2_noint 13 4680.009 4737.191
autism_lmer_quad_id_slope2_noint 13 4680.009 4737.384
autism_lme_quad_id_slope2_noint_nocor 12 4678.391 4731.174
autism_lme_stick_id 11 5454.106 5502.490
autism_lme_stick_id_slope 13 4748.713 4805.895

We’ve fit all these “intuitive” models, and most of them give sensible predicted values for our dataset, capturing the main trends we’re after. Given that we’re using random coefficient models, we’ll probably rule out the models that only vary the intercept. The predictive plots suggest we at the very least need to model complexity at the linear or quadratic level (or with splines). We’ll look further into these models, and their variance modeling, in the diagnostics.

The information criteria seem to pick out “autism_lme_quad_id_slope2”, the random effect model with random coefficients up to quadratic order and an unstructured correlation matrix in G. Though we remember there were some boundary warnings with that model, so removing the intercept for a model that is not near the boundary may be desirable.

Diagnostics

Let’s look at the mean structure with standard residual plots first. We’ll just pick out a few plots to look at.

Quad

plot(autism_lme_quad_id_slope2, form = resid(.,type = "normalized") ~ age | sicdegp) # normalized

To study the variance-covariance pattern of this model, I’ll examine how closely it matches up with the unstructured estimate of the covariance. We can also look at the sample covariance for some direction.

autism_gls_quad_un <- gls(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       correlation = corSymm(form = ~1 | childid),
                       weights = varIdent(form = ~ 1 | age),
                       data = autism_complete)

getVarCov(autism_gls_quad_un)
## Marginal variance covariance matrix
##         [,1]     [,2]     [,3]     [,4]     [,5]
## [1,] 10.8700   9.0082   9.3212   25.916   48.667
## [2,]  9.0082  55.7270  45.7570  110.010  201.300
## [3,]  9.3212  45.7570 135.7700  222.120  397.840
## [4,] 25.9160 110.0100 222.1200  771.040 1197.400
## [5,] 48.6670 201.3000 397.8400 1197.400 2287.700
##   Standard Deviations: 3.297 7.4651 11.652 27.768 47.83
getVarCov(autism_lme_quad_id_slope2, type = "marginal") # ZGZ + R
## childid 1 
## Marginal variance covariance matrix
##         1        2        3        4        5
## 1 39.0350   2.0468   5.4054   21.354   49.612
## 2  2.0468  53.4340  43.3380   96.860  148.630
## 3  5.4054  43.3380 156.7100  270.900  422.580
## 4 21.3540  96.8600 270.9000  748.630 1274.200
## 5 49.6120 148.6300 422.5800 1274.200 2568.100
##   Standard Deviations: 6.2478 7.3098 12.518 27.361 50.677
# we can also look at the sample covariance to get a rough sense for how the variances and covariances should be behaving
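The sample covariance the comment refers to can be computed directly by pivoting the data to wide format, one row per child and one column per age — a sketch assuming the five observed ages and the `autism_complete` columns used above:

```r
# observed (sample) covariance of vsae across ages, one row per child
autism_complete %>%
  select(childid, age, vsae) %>%
  pivot_wider(names_from = age, values_from = vsae) %>%
  select(-childid) %>%
  cov(use = "pairwise.complete.obs") # pairwise, since not every child has all five ages
```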

It seems the random effect model is severely overestimating the variance at age 2. There is also an interesting pattern in which it underestimates the covariance between ages 2 and 3. Other than that, the covariance looks to be within reason of capturing the overall trends. There are likely some optimizations in parameterization that could be made, but it’s difficult to know exactly how.

Quad simplified, no intercept

plot(autism_lme_quad_id_slope2_noint_nocor, form = resid(.,type = "normalized") ~ age | sicdegp) # normalized

getVarCov(autism_gls_quad_un)
## Marginal variance covariance matrix
##         [,1]     [,2]     [,3]     [,4]     [,5]
## [1,] 10.8700   9.0082   9.3212   25.916   48.667
## [2,]  9.0082  55.7270  45.7570  110.010  201.300
## [3,]  9.3212  45.7570 135.7700  222.120  397.840
## [4,] 25.9160 110.0100 222.1200  771.040 1197.400
## [5,] 48.6670 201.3000 397.8400 1197.400 2287.700
##   Standard Deviations: 3.297 7.4651 11.652 27.768 47.83
getVarCov(autism_lme_quad_id_slope2_noint_nocor, type = "marginal") # ZGZ + R
## childid 1 
## Marginal variance covariance matrix
##        1       2       3        4        5
## 1 55.373  11.014  21.709   51.146   91.311
## 2 11.014  67.485  38.850   97.086  179.460
## 3 21.709  38.850 134.400  229.700  440.750
## 4 51.146  97.086 229.700  706.560 1303.300
## 5 91.311 179.460 440.750 1303.300 2667.700
##   Standard Deviations: 7.4413 8.2149 11.593 26.581 51.65

The model with the simplified G structure is also lacking at the earlier ages, and the covariances with age 2 seem to increase too quickly. The second diagonal band is sometimes overestimated and sometimes underestimated.

Further covariance adjustments

The mixed effects model already has some complexity in the variance that is modeled, but it’s missing some parts of the covariance that we can try to adjust for on the R side of things. I find that there aren’t many guardrails when modeling things in this manner, so likelihood is generally my guide. You can try to create plots to clue you in on certain patterns but often it’s just faster to fit a bunch of parameterizations and check the results.

For the diagonal of the R matrix, it’s useful to know what your options are and how some of the variance classes can be combined; you can explore them with ?varClasses.

  • varIdent allows a different variance for each level of a stratification variable.
  • varExp models the variance as a (fitted) exponential function of a covariate.
  • varPower models the variance as a (fitted) power function of a covariate.
  • varFixed sets the variance proportional to a covariate, with no parameter to fit.
  • varComb allows combinations of any of the above.
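As an illustration of combining variance functions (this is not one of the fits compared below, and it may well fight the optimizer), varComb can nest varIdent and varExp so the residual variance differs by group while also growing exponentially in age:

```r
# illustrative only: per-group variance levels times an exponential trend in age
autism_lme_varcomb <- lme(vsae ~ age * sicdegp + I(age^2):sicdegp,
                          random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))),
                          data = autism_complete,
                          weights = varComb(varIdent(form = ~ 1 | sicdegp),
                                            varExp(form = ~ age)),
                          control = lmeControl(opt = "optim"))
```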

For the structure of the R matrix, you can see ?corStructs

  • corAR1 allows exponential decay in correlation with lag, i.e. distance from the diagonal.
  • corCAR1 allows exponential decay with distance in a continuous covariate (e.g. time).
  • corARMA combines autoregressive decay (order p) with q freely estimated bands off the diagonal.
  • corCompSymm gives constant off-diagonals.

There are also a number of spatial correlation structures (see ?corSpatial), which posit a functional form for how correlation drops off with distance. For the structure of the G matrix, the pd classes are the main tool:

  • pdDiag is useful for specifying that you only want a diagonal matrix for G. This is the (1 + age || childid) double bar option in lmer
  • pdSymm specifies that you want to estimate an unstructured matrix for G. This is the (1 + age | childid) default option in lmer.
  • pdBlocked is useful for composing matrix structures for nested effects in the G matrix.
  • pdCompSymm compound symmetry in G matrix. Only possible with flexLambda branch in lmer.
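For example, pdBlocked can compose an intercept block independent of an unstructured (age, age²) block. A sketch of the syntax only — this model isn't fit elsewhere in this post:

```r
# illustrative only: intercept independent of a correlated (age, age^2) pair in G
autism_lme_blocked <- lme(vsae ~ age * sicdegp + I(age^2):sicdegp,
                          random = list(childid = pdBlocked(list(
                            pdIdent(~ 1),
                            pdSymm(~ 0 + age + I(age^2))))),
                          data = autism_complete,
                          control = lmeControl(opt = "optim"))
```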
# This first model tries to add a parameter to adjust the variance of age=2, since the diagnostics above were close except for this term.
autism_lme_quad_id_slope2_het2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                     random = list(childid = pdSymm(form = ~ 1 + age + I(age^2))), # random intercept, linear and quadratic term
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | I(age == 2)),
                       control = lmeControl(opt = "optim", maxIter = 1000, msMaxIter = 1000, msVerbose = TRUE)) # unfortunately it's a fight with the optimizer, so we need to simplify.
## initial  value 3387.370642
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 1
## Error in logLik.reStruct(object, conLin): NA/NaN/Inf in foreign function call (arg 3)

# try removing the intercept, and fit simple model
autism_lme_quad_id_slope2_noint_het2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                     random = list(childid = pdSymm(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | I(age == 2)),
                       control = lmeControl(opt = "optim", maxIter = 1000, msMaxIter = 1000, msVerbose = TRUE)) # converged w/ warnings
## initial  value 3397.116545
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 1
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 47
## iter  10 value 3324.372416
## final  value 3324.349202 
## converged

# the more flexible diagonal values
autism_lme_quad_id_slope2_noint_het <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                     random = list(childid = pdSymm(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age),
                       control = lmeControl(opt = "optim")) # converged w/ warnings
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 1

## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 47

## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 47

# we'll try more heterogeneous R matrices, but with the simplified G matrix and without intercept to avoid optimizer issues
autism_lme_quad_id_slope2_noint_nocor_het <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                     random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age),
                       control = lmeControl(opt = "optim"))
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 1
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 4

autism_lme_quad_id_slope2_noint_nocor_het2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                     random = list(childid = pdDiag(form = ~0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | I(age == 2)),
                       control = lmeControl(opt = "optim"))

# note: this refit overwrites the previous het2 fit with the fully heterogeneous weights, duplicating _nocor_het above
autism_lme_quad_id_slope2_noint_nocor_het2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                     random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age),
                       control = lmeControl(opt = "optim"))
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 1

## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 4

autism_lme_quad_id_slope2_noint_nocor_ar1 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                                    random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       correlation = corAR1(form = ~ 1 | childid),
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age),
                       control = lmeControl(opt = "optim"))
## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 1

## Warning in logLik.reStruct(object, conLin): Singular precision matrix in level
## -1, block 4

autism_lme_quad_id_slope2_noint_nocor_ma1 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       correlation = corARMA(p = 0, q = 1),
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age))

autism_lme_quad_id_slope2_noint_nocor_ma2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       correlation = corARMA(p = 0, q = 2),
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age))

autism_lme_quad_id_slope2_noint_nocor_ar1ma2 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       correlation = corARMA(p = 1, q = 2),
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age))

autism_lme_quad_id_slope2_noint_nocor_car1 <- lme(vsae ~ age*sicdegp + I(age^2):sicdegp,
                       random = list(childid = pdDiag(form = ~ 0 + age + I(age^2))), # random linear and quadratic term, no intercept
                       correlation = corCAR1(form = ~ 1 | childid),
                       data = autism_complete,
                       weights = varIdent(form = ~ 1 | age))
autism_lme_cor_ic <- AIC(
  autism_lmer_quad_id_slope2,
  autism_lme_quad_id_slope2_noint_het,
  autism_lme_quad_id_slope2_noint_het2,
  autism_lme_quad_id_slope2_noint_nocor,
  autism_lme_quad_id_slope2_noint_nocor_het,
  autism_lme_quad_id_slope2_noint_nocor_het2,
     autism_lme_quad_id_slope2_noint_nocor_ma1,
     autism_lme_quad_id_slope2_noint_nocor_ma2,
     autism_lme_quad_id_slope2_noint_nocor_ar1,
  autism_lme_quad_id_slope2_noint_nocor_ar1ma2,
     autism_lme_quad_id_slope2_noint_nocor_car1) %>% 
  add_column(BIC = BIC(
  autism_lmer_quad_id_slope2,
  autism_lme_quad_id_slope2_noint_het,
  autism_lme_quad_id_slope2_noint_het2,
    autism_lme_quad_id_slope2_noint_nocor,
    autism_lme_quad_id_slope2_noint_nocor_het,
  autism_lme_quad_id_slope2_noint_nocor_het2,
    autism_lme_quad_id_slope2_noint_nocor_ma1,
    autism_lme_quad_id_slope2_noint_nocor_ma2,
    autism_lme_quad_id_slope2_noint_nocor_ar1,
    autism_lme_quad_id_slope2_noint_nocor_ar1ma2,
    autism_lme_quad_id_slope2_noint_nocor_car1)$BIC)
## Warning in AIC.default(autism_lmer_quad_id_slope2,
## autism_lme_quad_id_slope2_noint_het, : models are not all fitted to the same
## number of observations
## Warning in BIC.default(autism_lmer_quad_id_slope2,
## autism_lme_quad_id_slope2_noint_het, : models are not all fitted to the same
## number of observations

autism_lme_cor_ic %>% kbl() %>% 
  kable_classic(full_width=F) %>% 
  column_spec(3, color = c("black", "red")[as.numeric(autism_lme_cor_ic$AIC == min(autism_lme_cor_ic$AIC)) + 1]) %>%
  column_spec(4, color = c("black", "red")[as.numeric(autism_lme_cor_ic$BIC == min(autism_lme_cor_ic$BIC)) + 1])
df AIC BIC
autism_lmer_quad_id_slope2 16 4639.983 4710.598
autism_lme_quad_id_slope2_noint_het 17 4484.389 4559.165
autism_lme_quad_id_slope2_noint_het2 14 4536.707 4598.287
autism_lme_quad_id_slope2_noint_nocor 12 4678.391 4731.174
autism_lme_quad_id_slope2_noint_nocor_het 16 4482.832 4553.209
autism_lme_quad_id_slope2_noint_nocor_het2 16 4482.832 4553.209
autism_lme_quad_id_slope2_noint_nocor_ma1 17 4483.408 4558.184
autism_lme_quad_id_slope2_noint_nocor_ma2 18 4484.452 4563.627
autism_lme_quad_id_slope2_noint_nocor_ar1 17 4483.537 4558.313
autism_lme_quad_id_slope2_noint_nocor_ar1ma2 19 4486.409 4569.982
autism_lme_quad_id_slope2_noint_nocor_car1 17 4483.531 4558.307

It seems like our efforts paid off by improving the model, but some of the parameterizations did not improve the fit as much as we may have been hoping. Many are quite close, so I honestly believe we’ve done most of the corrections appropriate for this model and we’re fighting for scraps now, at least with these modeling techniques. The best fit by both AIC and BIC (among what we tried) is the simplified G model with heterogeneous structure in R.

If you want a more formal quantification of model comparisons, use the likelihood ratio test (these fits share the same fixed effects and differ only in the variance structure, so the REML likelihoods are comparable).

anova(autism_lme_quad_id_slope2_noint_nocor, autism_lme_quad_id_slope2_noint_nocor_het)
##                                           Model df      AIC      BIC    logLik
## autism_lme_quad_id_slope2_noint_nocor         1 12 4678.391 4731.174 -2327.196
## autism_lme_quad_id_slope2_noint_nocor_het     2 16 4482.832 4553.209 -2225.416
##                                             Test  L.Ratio p-value
## autism_lme_quad_id_slope2_noint_nocor                            
## autism_lme_quad_id_slope2_noint_nocor_het 1 vs 2 203.5595  <.0001

Final Model

We have probably done enough modeling to come up with a final model; we’ll use the best fit from the last iteration of improvements, though the model selected by AIC/BIC still has some singularity issues and trouble with the optimizer. We’d have to spend some extra effort to track down why that’s happening.

getVarCov(autism_gls_quad_un)
## Marginal variance covariance matrix
##         [,1]     [,2]     [,3]     [,4]     [,5]
## [1,] 10.8700   9.0082   9.3212   25.916   48.667
## [2,]  9.0082  55.7270  45.7570  110.010  201.300
## [3,]  9.3212  45.7570 135.7700  222.120  397.840
## [4,] 25.9160 110.0100 222.1200  771.040 1197.400
## [5,] 48.6670 201.3000 397.8400 1197.400 2287.700
##   Standard Deviations: 3.297 7.4651 11.652 27.768 47.83
getVarCov(autism_lme_quad_id_slope2_noint_nocor_het, type = "marginal") # seems to reflect the pattern in covariance quite well.
## childid 1 
## Marginal variance covariance matrix
##         1        2       3        4        5
## 1 12.1870   6.9707  14.602   37.024   68.995
## 2  6.9707  47.3600  27.497   73.663  141.310
## 3 14.6020  27.4970 128.690  183.190  361.580
## 4 37.0240  73.6630 183.190  752.910 1104.700
## 5 68.9950 141.3100 361.580 1104.700 2252.800
##   Standard Deviations: 3.4909 6.8819 11.344 27.439 47.464
plot(autism_lme_quad_id_slope2_noint_nocor_het, form = resid(., type = "normalized")~fitted(.) | sicdegp, ylim = c(-8, 8))

intervals(autism_lme_quad_id_slope2_noint_nocor_het)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                         lower        est.     upper
## (Intercept)       -2.25946345  0.77647542 3.8124143
## age                1.68325878  3.26560911 4.8479594
## sicdegp2          -2.31980385  1.70808682 5.7359775
## sicdegp3          -7.07004509 -2.63431483 1.8014154
## age:sicdegp2      -2.17606966 -0.09458308 1.9869035
## age:sicdegp3       1.65369494  3.93648717 6.2192794
## sicdegp1:I(age^2) -0.15892339 -0.01247629 0.1339708
## sicdegp2:I(age^2) -0.08121223  0.04156100 0.1643342
## sicdegp3:I(age^2) -0.13239419  0.01868912 0.1697724
## 
##  Random Effects:
##   Level: childid 
##                  lower      est.     upper
## sd(age)      0.5372926 0.8451180 1.3293025
## sd(I(age^2)) 0.2364546 0.2731189 0.3154684
## 
##  Variance function:
##          lower      est.       upper
## 3  1.713870157 2.0708134    2.502096
## 5  2.243430178 2.8094052    3.518165
## 9  4.001851151 5.0275041    6.316026
## 13 0.000146793 0.4443621 1345.144133
## 
##  Within-group standard error:
##    lower     est.    upper 
## 2.460852 2.852393 3.306230

It seems the variance function confidence interval for age 13 is extremely wide (upper bound over 1300), which hints at the source of the singularity the optimizer was warning about.

SAS Modeling

data autism;
INFILE "~/cals/training/autism.csv" DLM="," FIRSTOBS=2;
INPUT age vsae sicdegp childid;
agef = age;

/* Basic plotting */
proc sgpanel data=autism;
panelby sicdegp;
SERIES x=age y=vsae / group=childid;

/* R Side Modeling */
proc mixed data=autism IC plots=all method=REML;
    class sicdegp childid;
    model vsae=age|sicdegp / ddfm=kr2 solution OutPM=RanPAFit OutP=RanSSFit;
    repeated / type=un subject=childid rcorr R;
    ods output covparms = cov;
    
/* Visualize UN Covariance Matrix */
data times;
    do time1=1 to 5;
        do time2=1 to time1;
            output;
        end;
    end;

data covplot;
    merge times cov;
run;

proc sgplot data=covplot;
    title "Unstructured Covariance Estimates Line Plot";
    series x=time1 y=estimate / group=time2;

    
/* helper for modeling many R matrices */
%MACRO IC_cov (Rtype=UN);
    %let tablename=IC_&Rtype;
    %let modelname=Model_&Rtype;
    
    * convert types to usable string in repeated statement;
    %if &Rtype=ANTE %then %let Rtypename=%str(ANTE(1));
    %else %if &Rtype=AR %then %let Rtypename=%str(AR(1));
    %else %if &Rtype=ARH %then %let Rtypename=%str(ARH(1));
    %else %if &Rtype=SPPOW %then %let Rtypename=%str(SP(POW)(age));
    %else %let Rtypename=&Rtype;    
    /* %put &tablename; */
    /* %put &modelname; */
    
    proc mixed data=autism IC plots=all method=REML;
    class sicdegp childid;
    model vsae=age|sicdegp / ddfm=kr2;
    repeated / type=&Rtypename subject=childid rcorr R;
    ods output InfoCrit=&tablename;
    
    data &modelname;
        length Name $ 10;
        set &tablename;
        Name="&Rtype";
        keep Name Parms Neg2LogLike AIC AICC BIC CAIC HQIC;
%MEND IC_cov;

/* test different covariance structures */
%IC_cov(Rtype=UN)
%IC_cov(Rtype=CS)
%IC_cov(Rtype=ANTE);
%IC_cov(Rtype=AR);
%IC_cov(Rtype=ARH);
%IC_cov(Rtype=TOEP);
%IC_cov(Rtype=TOEPH);
%IC_cov(Rtype=SPPOW); /* not sure how to combine power + diagonal */

/* combine IC results and show table */
data model_combine;
    set Model_UN Model_CS Model_ANTE Model_AR Model_ARH Model_TOEP Model_TOEPH Model_SPPOW;
    keep Name Parms Neg2LogLike AIC AICC BIC CAIC HQIC;
    
proc print data=model_combine;

proc mixed data=autism;
class sicdegp childid;
model vsae=age|sicdegp age*age|sicdegp / ddfm=kr2;
repeated / type=UN subject=childid R;

/* G Side Modeling */
proc mixed data=autism;
class sicdegp childid;
model vsae=age|sicdegp / ddfm=kr2;
random age age*age / type=UN subject=childid G;
repeated / subject=childid R;

/* helper for modeling many G and R matrices */
%MACRO IC_G_cov (Rtype=VC, Gtype=VC, id=VCVC);
    %let tablename=IC_&id;
    %let modelname=Model_&id;
    
    * convert types to usable string in repeated statement;
    %if &Rtype=ANTE %then %let Rtypename=%str(ANTE(1));
    %else %if &Rtype=AR %then %let Rtypename=%str(AR(1));
    %else %if &Rtype=ARH %then %let Rtypename=%str(ARH(1));
    %else %if &Rtype=SPPOW %then %let Rtypename=%str(SP(POW)(age));
    %else %let Rtypename=&Rtype;

    proc mixed data=autism IC plots=all method=REML;
    class sicdegp childid;
    model vsae=age|sicdegp age*age|sicdegp / ddfm=kr2;
    random int age age*age / type=&Gtype subject=childid G;
    repeated / type=&Rtypename subject=childid rcorr R;
    ods output InfoCrit=&tablename;
    
    data &modelname;
        length Name $ 10;
        set &tablename;
        Name="&id";
        keep Name Parms Neg2LogLike AIC AICC BIC CAIC HQIC;
%MEND IC_G_cov;

%IC_G_cov(Rtype=VC, Gtype=UN, id=VC_UN);
%IC_G_cov(Rtype=VC, Gtype=VC, id=VC_VC);
/* %IC_G_cov(Rtype=CSH, Gtype=UN, id=CSH_UN); * optimizer issues; */

data model_G_combine;
    set Model_VC_UN Model_VC_VC;
    keep Name Parms Neg2LogLike AIC AICC BIC CAIC HQIC;
    
/* final model from R, no int, no correlation, SAS has trouble fitting */
/* proc mixed data=autism IC plots=all method=REML; */
/*  class sicdegp childid; */
/*  model vsae=age|sicdegp age*age|sicdegp / ddfm=kr2; */
/*  random age age*age / type=UN(1) G; */
/*  repeated / type=UN(1) subject=childid R; */

HTML SAS Output

Other thoughts: Mixed modeling is smart averaging of the individual fits toward the pooled model.

Here we’ll compare the

  1. individual fixed model (lm fit for each child separately)
  2. pooled fixed model
  3. mixed model (with various assumptions on random effects)

This idea has come up a number of times, and I’ve mentioned this plot a few times, so I thought I’d actually show the effect on this dataset.

This will also offer some extra intuition as to how the structure of the G matrix affects the estimated effects for each individual.

\[ \begin{aligned} E[u | y] &= GZ'V^{-1}(y - X\beta) \\ eBLUP(u) &= \hat GZ'\hat V^{-1}(y - X\hat\beta) \end{aligned} \]

G effect on BLUPs

Let’s just look at panel 1 for simplicity of visualization.

# the pooled model
autism_lm_1 <- lm(vsae ~ age, data = autism_complete %>% filter(sicdegp == 1))

pooled_data <- data.frame(as.list(coef(autism_lm_1)[c(1, 2 )])) %>% 
  rename("intercept" = 1,
         "slope" = "age") %>% 
  add_column(model = "pooled")
# the individual model
autism_lm_id_slope_1 <- lmList(vsae~age | childid, data = autism_complete %>% filter(sicdegp == 1)) # individual slopes model, just easier to extract coefs with this function

# sig1_children <- autism_complete %>% filter(sicdegp == 1) %>% distinct(childid) %>% pull(childid) 
indiv_data <- coef(autism_lm_id_slope_1) %>% 
  rownames_to_column(var = "child_id") %>% 
  rename("intercept" = 2,
         "slope" = 3) %>% 
  add_column(model = "indiv", .after = "child_id")
# the mixed models
# G matrix affect shape of pull for "BLUPs"
# G symm
autism_lme_id_slope_1_un <-  lme(vsae ~ age,
                              random = list(childid = pdSymm(form = ~age)),
                              data =  autism_complete %>% filter(sicdegp == 1),
                              control = lmeControl(opt = "optim"))

# G diag, het
autism_lme_id_slope_1_diag <-  lme(vsae ~ age,
                              random = list(childid = pdDiag(form = ~age)),
                              data =  autism_complete %>% filter(sicdegp == 1),
                              control = lmeControl(opt = "optim"))

# G diag, homo
autism_lme_id_slope_1_ident <- lme(vsae ~ age,
                                   random = list(childid = pdIdent(form = ~age)),
                                   data =  autism_complete %>% filter(sicdegp == 1),
                                   control = lmeControl(opt = "optim"))

# G CS, homo
autism_lme_id_slope_1_cs <- lme(vsae ~ age,
                                   random = list(childid = pdCompSymm(form = ~age)),
                                   data =  autism_complete %>% filter(sicdegp == 1),
                                   control = lmeControl(opt = "optim"))


# list of lme models
lme_models <-  list(mixed_un = autism_lme_id_slope_1_un, 
                   mixed_diag = autism_lme_id_slope_1_diag,
                   mixed_ident = autism_lme_id_slope_1_ident,
                   mixed_cs = autism_lme_id_slope_1_cs)

# mixed, fixed effects
mixed_data <- lme_models %>%  
  map_dfr(~data.frame(as.list(fixed.effects(.x))), # extract fixed effects as data frame
          .id = "model") %>% 
  rename("intercept" = 2,
         "slope" = 3)

# helper function for extracting individual lines
get_indiv_lines <- function(lme_object) {
  coef(lme_object) %>% as.matrix() %>% 
    as.data.frame() %>% 
    rownames_to_column("child_id") %>% 
    select("child_id", `(Intercept)`, `age`) %>% 
    rename("intercept" = 2,
           "slope" = 3)
}

# mixed model individuals
mixed_indiv_data <- lme_models %>% map_dfr(~get_indiv_lines(.x),
                       .id = "model")
combined_indiv_data <- bind_rows(indiv_data, mixed_indiv_data)
combined_avg_data <- combined_indiv_data %>%
  group_by(model) %>%
  summarize(across(intercept:slope, ~ mean(.x, na.rm = TRUE)))
indiv_avg_data <- indiv_data %>% group_by(model) %>%
  summarize(across(intercept:slope, ~ mean(.x, na.rm = TRUE)))
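Before plotting, we can sanity-check the eBLUP formula \(\hat GZ'\hat V^{-1}(y - X\hat\beta)\) by hand for a single child — a sketch against the unstructured fit autism_lme_id_slope_1_un from above:

```r
# hand-computed eBLUP for the first sicdegp 1 child; should match ranef()
m  <- autism_lme_id_slope_1_un
d1 <- autism_complete %>% filter(sicdegp == 1)
ch <- d1 %>% filter(childid == first(childid))
Zi <- cbind(1, ch$age)                # random intercept + slope design
Xi <- cbind(1, ch$age)                # fixed design is the same here (vsae ~ age)
G  <- as.matrix(getVarCov(m))         # estimated G
Vi <- Zi %*% G %*% t(Zi) + m$sigma^2 * diag(nrow(ch)) # V = ZGZ' + sigma^2 I
G %*% t(Zi) %*% solve(Vi) %*% (ch$vsae - Xi %*% fixed.effects(m)) # eBLUP
# compare to ranef(m)[as.character(ch$childid[1]), ]
```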

We’ve done all the fitting and data extraction, so now we’re ready to create the data frames for plotting.

# hack for each specifying what data goes to which "facet" of ggplot
# needs to duplicate data that will appear in multiple facets.
mixed_data_facet <- mixed_data %>% 
  mutate(facet_id = c("mixed_un" = 1, "mixed_ident" = 2, "mixed_diag" = 3, "mixed_cs" = 4)[model])
mixed_indiv_data_facet <- mixed_indiv_data %>%
  mutate(facet_id = c("mixed_un" = 1, "mixed_ident" = 2, "mixed_diag" = 3, "mixed_cs" = 4)[model])
indiv_data_facet <-  1:4 %>% map_dfr(~add_column(indiv_data, facet_id = .)) # 4 copies of individual data, one for each facet
combined_indiv_data_facet <- bind_rows(mixed_indiv_data_facet, indiv_data_facet) %>% arrange(model)

# ellipse level curves of G
mixed_ellipses <- lme_models %>% 
  map_dfr(
    function(lme_obj) {
      G_hat <- getVarCov(lme_obj)
      mu <- fixed.effects(lme_obj) 
      ellipse(G_hat, centre = mu) %>% 
        as_tibble(.name_repair = ~c("intercept", "slope"))
    },
    .id = "model") %>% 
  mutate(facet_id = c("mixed_un" = 1, "mixed_ident" = 2, "mixed_diag" = 3, "mixed_cs" = 4)[model]) # specify which facet for each ellipse 
ggplot(data = combined_indiv_data_facet) +
  geom_path(mapping = aes(intercept, slope, group = child_id), # fixed -> mixed
            arrow = arrow(ends = "last", length = unit(.1, "cm")),
            alpha =.7) +
  geom_point(mapping = aes(intercept, slope, color = model, alpha = model), # mixed estimates
             pch = 16) + 
  geom_point(mapping = aes(intercept, slope, color = model, alpha = model), # mixed center
             data = mixed_data_facet, size = 8, shape = 4) +
  geom_path(mapping = aes(intercept, slope, color = model), # G level ellipses
            data = mixed_ellipses,
            linetype = 2,
            alpha = .4) + 
  facet_wrap(~facet_id) +
  scale_color_manual(values = c("#000000", "#E69F00", "#009E73", "#0072B2", "#CC79A7")) +
  scale_alpha_manual(values = c(.4, rep(.6, 4))) +
  labs(title = "G structure effect on mixed effect estimation")
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).

# show G hats for each of the models
lme_models %>% map(getVarCov)
## $mixed_un
## Random effects variance covariance matrix
##             (Intercept)     age
## (Intercept)      47.664 -24.470
## age             -24.470  12.565
##   Standard Deviations: 6.9039 3.5447 
## 
## $mixed_diag
## Random effects variance covariance matrix
##             (Intercept)    age
## (Intercept)    0.013694 0.0000
## age            0.000000 6.4369
##   Standard Deviations: 0.11702 2.5371 
## 
## $mixed_ident
## Random effects variance covariance matrix
##             (Intercept)    age
## (Intercept)      6.4617 0.0000
## age              0.0000 6.4617
##   Standard Deviations: 2.542 2.542 
## 
## $mixed_cs
## Random effects variance covariance matrix
##             (Intercept)     age
## (Intercept)      8.9804 -8.9780
## age             -8.9780  8.9804
##   Standard Deviations: 2.9967 2.9967
g_mu <- ggplot(mapping = aes(intercept, slope, color = model, alpha = model)) +
  geom_point(data = indiv_data, pch = 16) + 
  geom_point(data = mixed_data_facet, shape = 4, size = 4) +
  geom_point(data = pooled_data, shape = 4, size = 4) + 
  scale_color_manual(values = c("#000000", "#E69F00", "#009E73", "#0072B2", "#CC79A7", "#347827")) +
  scale_alpha_manual(values = c(.4, rep(.6, 5)))
g_mu_zoom <- g_mu + coord_cartesian(xlim = c(1,3), ylim = c(2,3.5)) +
  theme(legend.position = "none")

vp_zoom <- viewport(width = .3, height = .3, x = .67, y = .85)

print(g_mu)
## Warning: Removed 1 rows containing missing values (geom_point).
print(g_mu_zoom, vp = vp_zoom)
## Warning: Removed 1 rows containing missing values (geom_point).

R effect on BLUPs

In this section we’ll play with the “R”-side specification a little and see how that affects the BLUPs. Changing R affects the BLUPs by changing the estimate of “V” in the equation:

\[ \begin{aligned} E[u | y] &= GZ'V^{-1}(y - X\beta) \\ eBLUP(u) &= \hat GZ'\hat V^{-1}(y - X\hat\beta) \end{aligned} \]
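As a sanity check on this formula, here is a minimal, self-contained sketch on simulated data (the data and all names here are illustrative, not from the autism study). With a single random intercept, \(G\) is a scalar and \(Z_i\) is a column of ones, so the eBLUP for group \(i\) reduces to \(\hat G \mathbf{1}'\hat V_i^{-1}(y_i - X_i\hat\beta)\):

```r
library(nlme)

# simulate a small random-intercept data set
set.seed(1)
n_grp <- 10; n_per <- 5
d <- data.frame(g = factor(rep(1:n_grp, each = n_per)),
                x = rep(1:n_per, n_grp))
d$y <- 2 + 0.5 * d$x + rep(rnorm(n_grp, sd = 2), each = n_per) + rnorm(nrow(d))

fit <- lme(y ~ x, random = ~ 1 | g, data = d)

sig2_u <- getVarCov(fit)[1, 1]  # estimated G (a scalar here)
sig2_e <- fit$sigma^2           # estimated residual variance, R = sig2_e * I
beta   <- fixed.effects(fit)

# eBLUP(u_i) = G Z_i' V_i^{-1} (y_i - X_i beta), with Z_i = 1, V_i = sig2_u*J + sig2_e*I
blup_by_hand <- sapply(split(d, d$g), function(di) {
  Vi <- sig2_u + diag(sig2_e, nrow(di))       # sig2_u added to every entry = sig2_u*J
  ri <- di$y - cbind(1, di$x) %*% beta
  sig2_u * sum(solve(Vi, ri))
})

# should agree with lme's BLUPs to numerical precision
max(abs(blup_by_hand - ranef(fit)[names(blup_by_hand), "(Intercept)"]))
```

The hand-computed values should match `ranef()` essentially exactly, since the eBLUP is just this plug-in formula evaluated at the REML estimates.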

# heterogeneous
autism_lme_id_slope_1_ident_het <- lme(vsae ~ age,
                                   random = list(childid = pdIdent(form = ~age)),
                                   data =  autism_complete %>% filter(sicdegp == 1),
                                   weights = varIdent(form = ~ age),
                                   control = lmeControl(opt = "optim"))

autism_lme_id_slope_1_ident_cs <- lme(vsae ~ age,
                                   random = list(childid = pdIdent(form = ~age)),
                                   data =  autism_complete %>% filter(sicdegp == 1),
                                   correlation = corCompSymm(form = ~ 1 | childid),
                                   control = lmeControl(opt = "optim"))

autism_lme_id_slope_1_ident_car <- lme(vsae ~ age,
                                   random = list(childid = pdIdent(form = ~age)),
                                   data =  autism_complete %>% filter(sicdegp == 1),
                                   correlation = corCAR1(form = ~ age | childid),
                                   weights = varIdent(form = ~ age),
                                   control = lmeControl(opt = "optim"))

autism_lme_id_slope_1_ident_un <- lme(vsae ~ age,
                                   random = list(childid = pdIdent(form = ~age)),
                                   data =  autism_complete %>% filter(sicdegp == 1),
                                   correlation = corSymm(form = ~ 1 | childid),
                                   weights = varIdent(form = ~ age),
                                   control = lmeControl(opt = "optim"))
lme_r_models <- list(mixed_r_ident = autism_lme_id_slope_1_ident,
                     mixed_r_un = autism_lme_id_slope_1_ident_un,
                     mixed_r_cs = autism_lme_id_slope_1_ident_cs,
                     mixed_r_car = autism_lme_id_slope_1_ident_car)

mixed_r_indiv_data <- lme_r_models %>% map_dfr(~get_indiv_lines(.x),
                       .id = "model")
# plotting dfs
# for faceting
# can use same one
# indiv_data_facet <-  1:4 %>% map_dfr(~add_column(indiv_data, facet_id = .)) # 4 copies of individual data, one for each facet
mixed_r_indiv_data_facet <- mixed_r_indiv_data %>% mutate(facet_id = c("mixed_r_ident" = 1, "mixed_r_un" = 2, "mixed_r_cs" = 3, "mixed_r_car" = 4)[model])
combined_r_indiv_data_facet <- bind_rows(mixed_r_indiv_data_facet, indiv_data_facet) %>% arrange(model)
combined_r_indiv_data_facet %>% 
  ggplot(aes(intercept, slope, color = model, alpha = model)) +
  geom_path(aes(intercept, slope, group = child_id),
            arrow = arrow(ends = "last", length = unit(.1, "cm")),
            color = "black",
            alpha =.7) +
  geom_point(pch = 16) +
  scale_color_manual(values = c("#000000", "#E69F00", "#009E73", "#0072B2", "#CC79A7")) +
  scale_alpha_manual(values = c(.4, rep(.7, 4))) +
  facet_wrap(~facet_id)
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).

lme_r_models %>% map(getVarCov, type = "marginal")
## $mixed_r_ident
## childid 2 
## Marginal variance covariance matrix
##         1       2       3      4       5
## 1  85.996  45.232  71.079 122.77  174.47
## 2  45.232 118.300 103.390 180.93  258.47
## 3  71.079 103.390 221.690 297.24  426.47
## 4 122.770 180.930 297.240 583.55  762.48
## 5 174.470 258.470 426.470 762.48 1152.20
##   Standard Deviations: 9.2734 10.877 14.889 24.157 33.944 
## 
## $mixed_r_un
## childid 2 
## Marginal variance covariance matrix
##         1      2      3      4       5
## 1  87.144  75.38 103.39 124.70  202.00
## 2  75.380 124.58 116.16 202.06  296.72
## 3 103.390 116.16 244.38 329.34  465.61
## 4 124.700 202.06 329.34 663.67  856.77
## 5 202.000 296.72 465.61 856.77 1322.50
##   Standard Deviations: 9.3351 11.162 15.633 25.762 36.367 
## 
## $mixed_r_cs
## childid 2 
## Marginal variance covariance matrix
##         1       2       3      4       5
## 1  76.107  28.296  51.154  96.87  142.59
## 2  28.296 104.680  79.726 148.30  216.88
## 3  51.154  79.726 196.110 251.16  365.45
## 4  96.870 148.300 251.160 516.13  662.61
## 5 142.590 216.880 365.450 662.61 1019.00
##   Standard Deviations: 8.724 10.231 14.004 22.718 31.922 
## 
## $mixed_r_car
## childid 2 
## Marginal variance covariance matrix
##         1       2       3      4       5
## 1  99.642  92.726  96.156 134.07  184.84
## 2  92.726 133.680 139.880 197.49  273.81
## 3  96.156 139.880 242.580 327.73  452.51
## 4 134.070 197.490 327.730 623.76  817.82
## 5 184.840 273.810 452.510 817.82 1222.80
##   Standard Deviations: 9.9821 11.562 15.575 24.975 34.968

Other thoughts: Fixed vs Random

Random effects occupy a “middle ground” between fixed effects and error, so there are more decisions about which level an effect should fall under in the model. Can you explain the differences between the following models? All of them loosely fall under the category of “fitting a line to each child”.

autism_complete_3 <- autism_complete %>% filter(sicdegp == 3) # just worry about one panel of sicdegp for now

mod0 <- lm(vsae ~ age*childid, data = autism_complete_3)
mod1 <- lmer(vsae ~ (age | childid), data = autism_complete_3)
## boundary (singular) fit: see help('isSingular')
mod2 <- lmer(vsae ~ age + age:childid + (1 | childid), data = autism_complete_3)
mod3 <- lmer(vsae ~ childid + age + age:childid + (1 | childid), data = autism_complete_3)
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
mod4 <- lmer(vsae ~ age + (age | childid), data = autism_complete_3) # the one I would choose
## boundary (singular) fit: see help('isSingular')
mod5 <- lmer(vsae ~ childid + age +  (age | childid), data = autism_complete_3)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
mod6 <- lmer(vsae ~ age + age:childid + (age | childid), data = autism_complete_3)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
mod7 <- lmer(vsae ~ childid*age +  (age | childid), data = autism_complete_3)
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

All give slightly different predictions, but they are similar in functional form and in the information they model.

mod_yhat <- autism_complete_3 %>% bind_cols(
  yhat_mod0 = predict(mod0),
  yhat_mod1 = predict(mod1),
  yhat_mod2 = predict(mod2),
  yhat_mod3 = predict(mod3),
  yhat_mod4 = predict(mod4),
  yhat_mod5 = predict(mod5),
  yhat_mod6 = predict(mod6),
  yhat_mod7 = predict(mod7), # mostly same as 3
)

mod_yhat %>% select(starts_with("yhat")) %>% 
  head()
##   yhat_mod0 yhat_mod1 yhat_mod2 yhat_mod3 yhat_mod4 yhat_mod5 yhat_mod6
## 1  7.725962  15.58255  4.839585  7.725962  13.63232  6.220948  4.839452
## 2  9.742788  16.67122  7.268751  9.742788  15.08613  8.579824  7.268638
## 3 13.776442  18.84857 12.127084 13.776442  17.99374 13.297574 12.127008
## 4 21.843750  23.20327 21.843750 21.843750  23.80898 22.733076 21.843750
## 5 29.911058  27.55797 31.560416 29.911058  29.62421 32.168578 31.560492
## 6 15.028846  15.79317  6.937472 15.028846  13.72317 13.134900  6.937088
##   yhat_mod7
## 1  7.725962
## 2  9.742788
## 3 13.776442
## 4 21.843750
## 5 29.911058
## 6 15.028846

Note that the predictions are similar between some models, but the differences are large enough that they are probably not rounding error. The closest pairs are (2, 6) and (3, 7).

mod_yhat %>% pivot_longer(cols = c(vsae, starts_with("yhat"))) %>% 
  ggplot(aes(age, value, group = childid, color = name)) +
  geom_line() +
  facet_grid(name~.)

Glancing at the predictions as a whole, though, they are mostly the same, capturing all of the main trends.

anova(mod1,
      mod2,
      mod3,
      mod4,
      mod5,
      mod6,
      mod7) %>% as.data.frame() %>% 
  rownames_to_column() %>% 
  arrange(rowname) %>% 
  select(-last_col(), -Chisq) # get rid of test columns
## refitting model(s) with ML (instead of REML)
##   rowname npar      AIC      BIC    logLik deviance Df
## 1    mod1    5 1399.561 1415.091 -694.7805 1389.561 NA
## 2    mod2   44 1312.159 1448.821 -612.0796 1224.159 38
## 3    mod3   83 1309.771 1567.564 -571.8854 1143.771 37
## 4    mod4    6 1355.623 1374.258 -671.8114 1343.623  1
## 5    mod5   46 1436.483 1579.357 -672.2417 1344.483  2
## 6    mod6   46 1311.218 1454.092 -609.6090 1219.218  0
## 7    mod7   85 1313.771 1577.776 -571.8854 1143.771  2

The number of parameters for each of these models is wildly different, and thus so are the information criteria. What’s going on?
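Part of the answer is bookkeeping: for `lmer`, `npar` counts (number of fixed-effect coefficients) + (number of G-side covariance parameters) + 1 for the residual variance, so models with childid in the fixed effects pay roughly 40 extra parameters. A quick illustration on lme4’s built-in sleepstudy data (not the autism data):

```r
library(lme4)

# random intercept and slope per subject, structurally like mod4
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

length(fixef(m))           # 2 fixed effects: intercept and Days
length(getME(m, "theta"))  # 3 covariance parameters: 2 variances + 1 covariance
# npar = 2 + 3 + 1 (residual variance) = 6, which is what anova() reports
```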

I think the differences in modeling are a little nuanced, but largely I see this as a reframing of the fixed vs random debate. Random coefficients make the separation of the two effect types even more confusing to me. Definitionally, treating something as fixed implies that you’re interested in making inferences about those levels, whereas treating something as random accounts for randomness in the population you want to make inferences about. In experimental designs, the split is somewhat easier: you have “experimental” components of your design, like treatments you wish to distinguish, which really should be fixed effects, and everything else about your experiment, like blocks and the units you randomize your treatments to. Walter Stroup advocates an ANOVA-esque approach to this (apparently IMS bulletins discuss this approach as “What would Fisher Do?”): he calls the blocks and units the “topographical features” and the treatments the “treatment features”. I think Milliken and Johnson describe a similar split with the “treatment structure” and “design structure” of your study.

At the end of the day, I consider these things:

  1. “inference spaces” - a matter of what population you are trying to make inferences about. Which parts of the design do I truly wish to assign a probability distribution to?
  2. “borrow strength” - due to the shrinkage effect, if you have blocks with complete information and blocks with very little information, you can recover a lot by assuming that the sparse blocks are similar to the complete ones. That is, missing-data situations are ones where you’d naturally want to attach a probability distribution in order to make better inferences.
  3. “variance implications” - modeling random effects carries implications about the covariance matrix of the observations. Am I willing to accept that? Do the random effect assumptions capture the population well?
  4. computation…
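Point 2, the “borrow strength” shrinkage effect, is easy to demonstrate with a small simulation (entirely made-up data; the group sizes and variances are arbitrary): groups with few observations get pulled much harder toward the population mean than data-rich groups.

```r
library(lme4)

set.seed(42)
n_obs <- c(rep(30, 5), rep(2, 5))   # 5 data-rich groups, 5 sparse groups
g <- factor(rep(seq_along(n_obs), n_obs))
mu_g <- rnorm(length(n_obs), mean = 10, sd = 4)   # true group means
y <- mu_g[as.integer(g)] + rnorm(length(g), sd = 5)
d <- data.frame(g, y)

fit <- lmer(y ~ 1 + (1 | g), data = d)
raw_means  <- tapply(d$y, d$g, mean)
blup_means <- fixef(fit)[1] + ranef(fit)$g[, 1]

# how far each BLUP sits from the population mean, relative to the raw group mean
shrink <- abs(blup_means - fixef(fit)[1]) / abs(raw_means - fixef(fit)[1])
round(shrink, 2)   # sparse groups (6-10) are shrunk much harder than rich ones
```

The ratio here is the classic shrinkage factor \(n_i\tau^2/(n_i\tau^2 + \sigma^2)\): near 1 for the data-rich groups, well below 1 for the sparse ones.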

The models in which childid appears as both a fixed effect AND a random effect grouping are a little unreasonable, as those terms will be fighting for the same information. Using age as both a fixed and a random covariate seems appropriate because there is a population growth rate I’m interested in studying, as well as variation in that rate among the children I’ve measured. That leaves model 4.

The “randomness I’m assuming in the population” can be visualized more clearly by showing just the random-effect components of the estimates. We can also plot the fixed-effect fitted values alongside them to show the split of fixed and random among all these models!

mod_random <- autism_complete_3 %>% bind_cols(
  # yhat_mod0 = predict(mod0), # not a random effect model
  yhat_mod1 = predict(mod1, random.only = T),
  yhat_mod2 = predict(mod2, random.only = T),
  yhat_mod3 = predict(mod3, random.only = T),
  yhat_mod4 = predict(mod4, random.only = T),
  yhat_mod5 = predict(mod5, random.only = T),
  yhat_mod6 = predict(mod6, random.only = T),
  yhat_mod7 = predict(mod7, random.only = T), # mostly same as 3
)

mod_fixed <- autism_complete_3 %>% bind_cols(
  # yhat_mod0 = predict(mod0), # not a random effect model
  yhat_mod1 = predict(mod1, re.form = NA),
  yhat_mod2 = predict(mod2, re.form = NA),
  yhat_mod3 = predict(mod3, re.form = NA),
  yhat_mod4 = predict(mod4, re.form = NA),
  yhat_mod5 = predict(mod5, re.form = NA),
  yhat_mod6 = predict(mod6, re.form = NA),
  yhat_mod7 = predict(mod7, re.form = NA), # mostly same as 3
)


mod_fixed_plot <- mod_fixed %>% pivot_longer(cols = c(starts_with("yhat"))) %>% 
  ggplot(aes(age, value, group = childid, color = name)) +
  geom_line() +
  facet_grid(name~.) + 
  labs(title = "Fixed")

mod_random_plot <- mod_random %>% pivot_longer(cols = c(starts_with("yhat"))) %>% 
  ggplot(aes(age, value, group = childid, color = name)) +
  geom_line() +
  facet_grid(name~.) +
  labs(title = "Random")

ggarrange(mod_fixed_plot, mod_random_plot, ncol = 2, common.legend = TRUE, legend = "bottom")

In the models where we overparameterize the fixed effects, we see no random variability. Since the whole point of a random effect is to capture the variability of the population, models 2, 3, 6, and 7 simply don’t reflect that distribution. The difference between 1 and 4 is whether the population randomness is centered around a population trend: in 4 the random components fan out around 0, because the fixed age effect accounts for the population-level trend, whereas in model 1 they fan out above 0.
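The same collapse can be provoked on any grouped data set: once the grouping factor sits in the fixed effects, the random intercept has nothing left to explain. A sketch using lme4’s built-in sleepstudy data (not the autism data):

```r
library(lme4)

# Subject as a fixed effect AND a random grouping: the fixed dummies
# absorb the between-subject differences, so the random-intercept
# variance gets driven to (or very near) zero -- a boundary/singular fit
m_both <- lmer(Reaction ~ Subject + (1 | Subject), data = sleepstudy)

VarCorr(m_both)      # Subject intercept SD essentially 0
isSingular(m_both)   # typically TRUE here
```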